
Collaborating Authors

proximal policy optimization algorithm


Hybrid-Quantum Neural Architecture Search for The Proximal Policy Optimization Algorithm

Zada, Moustafa

arXiv.org Artificial Intelligence

Quantum machine learning (QML) has emerged in recent years as a promising approach to advancing classical machine learning, since quantum models can evaluate classically intractable functions [1]: they have access to properties that classical computational models lack, such as the exponential Hilbert space, entanglement, and the parallelism of quantum computation in the presence of superposition. However, since we are still in the NISQ era, with limited quantum computers, many studies have explored hybrid architectures in the hope of extracting an advantage from currently existing quantum hardware, or of distributing the work harmoniously and efficiently between the classical and quantum parts, each doing what it is good at. An important distinction must be made here: we do not consider variational quantum circuits (VQCs) on their own to be hybrid classical-quantum models, since they use the classical computer only to train and optimize the quantum circuit, without it taking part in the actual learning and inference of the model. Instead, we treat the VQC itself as a layer, a quantum layer, in the neural network of the hybrid model, alongside classical layers of artificial neurons. In this paper, we apply the proximal policy optimization (PPO) algorithm to the classical CartPole environment to ascertain whether a quantum-enhanced PPO algorithm gives any advantage over the classical version. In particular, we use a genetic-algorithm-like version of quantum PPO to train the system. This paper is organized as follows: in Sec. 2, we examine studies analogous to our experiments to gauge what has been done and how we can improve upon it; in Sec. 3, we provide a brief synopsis of the operation of the PPO algorithm.
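The clipped surrogate objective at the core of PPO, which the abstract refers to, can be sketched minimally for a single sample; the function name and signature below are illustrative assumptions, not taken from the paper's code, and a real implementation would batch this over trajectories.

```python
import math

def ppo_clip_objective(log_prob_new, log_prob_old, advantage, eps=0.2):
    """Clipped surrogate objective for one (state, action) sample.

    ratio = pi_new(a|s) / pi_old(a|s); clipping the ratio to
    [1 - eps, 1 + eps] prevents destructively large policy updates.
    """
    ratio = math.exp(log_prob_new - log_prob_old)
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps) * advantage
    # PPO maximizes the pessimistic (smaller) of the two terms.
    return min(unclipped, clipped)
```

The same objective applies whether the policy network is purely classical or contains a VQC layer; only the parameterization of `log_prob_new` changes.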


Learning Curricula in Open-Ended Worlds

Jiang, Minqi

arXiv.org Artificial Intelligence

Deep reinforcement learning (RL) provides powerful methods for training optimal sequential decision-making agents. As collecting real-world interactions can entail additional costs and safety risks, the common paradigm of sim2real conducts training in a simulator, followed by real-world deployment. Unfortunately, RL agents easily overfit to the choice of simulated training environments, and worse still, learning ends when the agent masters the specific set of simulated environments. In contrast, the real world is highly open-ended, featuring endlessly evolving environments and challenges, making such RL approaches unsuitable. Simply randomizing over simulated environments is insufficient, as it requires making arbitrary distributional assumptions and can be combinatorially less likely to sample specific environment instances that are useful for learning. An ideal learning process should automatically adapt the training environment to maximize the learning potential of the agent over an open-ended task space that matches or surpasses the complexity of the real world. This thesis develops a class of methods called Unsupervised Environment Design (UED), which aim to produce such open-ended processes. Given an environment design space, UED automatically generates an infinite sequence or curriculum of training environments at the frontier of the learning agent's capabilities. Through extensive empirical studies and theoretical arguments founded on minimax-regret decision theory and game theory, the findings in this thesis show that UED autocurricula can produce RL agents exhibiting significantly improved robustness and generalization to previously unseen environment instances. Such autocurricula are promising paths toward open-ended learning systems that achieve more general intelligence by continually generating and mastering additional challenges of their own design.
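The minimax-regret selection principle that this thesis builds on can be sketched in a few lines: the curriculum prefers environment instances where an optimal policy would outperform the current agent the most. The function and level names below are illustrative assumptions, not code from the thesis.

```python
def select_next_level(agent_return, optimal_return):
    """Pick the level with maximal estimated regret (illustrative sketch).

    regret(level) = return of an optimal policy on that level
                  - return of the current agent on that level.
    A minimax-regret curriculum trains next on the highest-regret level.
    """
    regret = {lvl: optimal_return[lvl] - agent_return[lvl]
              for lvl in agent_return}
    return max(regret, key=regret.get)
```

In practice the optimal return is unknown, so UED methods approximate regret, for example via the performance gap between the learner and an antagonist agent.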


Mission schedule of agile satellites based on Proximal Policy Optimization Algorithm

Liu, Xinrui

arXiv.org Artificial Intelligence

Mission scheduling of satellites is an important part of space operations nowadays, since the number and types of satellites in orbit are increasing tremendously and their corresponding tasks are becoming more and more complicated. In this paper, a mission scheduling model combined with the Proximal Policy Optimization (PPO) algorithm is proposed. Different from traditional heuristic planning methods, this paper incorporates reinforcement learning into the problem and finds a new way to describe it. Several constraints, including data download, are considered in this paper.